NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications

Jaiswal, Ajay; Wang, Yifan; Yin, Lu; Liu, Shiwei; Chen; Runjin; Zhao, Jiawei; Grama, Ananth; Tian, Yuandong; Wang, Zhangyang (July 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available July 13, 2026
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Zhao, Jiawei; Zhang, Zhenyu; Chen, Beidi; Wang, Zhangyang; Anandkumar, Anima; Tian, Yuandong (July 2024, International Conference on Machine Learning (ICML))

Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank adaptation (LoRA), add a trainable low-rank matrix to the frozen pre-trained weight in each layer, reducing trainable parameters and optimizer states. However, such approaches typically underperform training with full-rank weights in both pre-training and fine-tuning stages since they limit the parameter search to a low-rank subspace and alter the training dynamics, and further, may require full-rank warm start. In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks. Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline. Notably, we demonstrate, for the first time, the feasibility of pre-training a 7B model on consumer GPUs with 24GB memory (e.g., NVIDIA RTX 4090) without model parallel, checkpointing, or offloading strategies.
more » « less
Full Text Available
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Zhao, Jiawei; Zhang, Zhenyu; Chen, Beidi; Wang, Zhangyang; Anandkumar, Anima; Tian, Yuandong (July 2024, International Conference on Machine Learning (ICML))

Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank adaptation (LoRA), add a trainable low-rank matrix to the frozen pre-trained weight in each layer, reducing trainable parameters and optimizer states. However, such approaches typically underperform training with full-rank weights in both pre-training and fine-tuning stages since they limit the parameter search to a low-rank subspace and alter the training dynamics, and further, may require full-rank warm start. In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks. Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline. Notably, we demonstrate, for the first time, the feasibility of pre-training a 7B model on consumer GPUs with 24GB memory (e.g., NVIDIA RTX 4090) without model parallel, checkpointing, or offloading strategies.
more » « less
Full Text Available
Shock impedance amplified impact deformation of zircon in granitic rocks from the Chicxulub impact crater

https://doi.org/10.1016/j.epsl.2021.117201

Wittmann, Axel; Cavosie, Aaron J.; Timms, Nicholas E.; Ferrière, Ludovic; Rae, Auriol; Rasmussen, Cornelia; Ross, Catherine; Stockli, Daniel; Schmieder, Martin; Kring, David A.; et al (December 2021, Earth and Planetary Science Letters)
null (Ed.)
Full Text Available
Evidence of Carboniferous arc magmatism preserved in the Chicxulub impact structure

https://doi.org/10.1130/B35831.1

Ross, Catherine H.; Stockli, Daniel F.; Rasmussen, Cornelia; Gulick, Sean P.S.; de Graaff, Sietze J.; Claeys, Philippe; Zhao, Jiawei; Xiao, Long; Pickersgill, Annemarie E.; Schmieder, Martin; et al (April 2021, GSA Bulletin)
null (Ed.)
Determining the nature and age of the 200-km-wide Chicxulub impact target rock is an essential step in advancing our understanding of the Maya Block basement. Few age constraints exist for the northern Maya Block crust, specifically the basement underlying the 66 Ma, 200 km-wide Chicxulub impact structure. The International Ocean Discovery Program-International Continental Scientific Drilling Program Expedition 364 core recovered a continuous section of basement rocks from the Chicxulub target rocks, which provides a unique opportunity to illuminate the pre-impact tectonic evolution of a terrane key to the development of the Gulf of Mexico. Sparse published ages for the Maya Block point to Mesoproterozoic, Ediacaran, Ordovician to Devonian crust are consistent with plate reconstruction models. In contrast, granitic basement recovered from the Chicxulub peak ring during Expedition 364 yielded new zircon U-Pb laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) concordant dates clustering around 334 ± 2.3 Ma. Zircon rare earth element (REE) chemistry is consistent with the granitoids having formed in a continental arc setting. Inherited zircon grains fall into three groups: 400−435 Ma, 500−635 Ma, and 940−1400 Ma, which are consistent with the incorporation of Peri-Gondwanan, Pan-African, and Grenvillian crust, respectively. Carboniferous U-Pb ages, trace element compositions, and inherited zircon grains indicate a pre-collisional continental volcanic arc located along the Maya Block’s northern margin before NW Gondwana collided with Laurentia. The existence of a continental arc along NW Gondwana suggests southward-directed subduction of Rheic oceanic crust beneath the Maya Block and is similar to evidence for a continental arc along the northern margin of Gondwana that is documented in the Suwannee terrane, Florida, USA, and Coahuila Block of NE México.
more » « less
Full Text Available
Geochemistry, geochronology and petrogenesis of Maya Block granitoids and dykes from the Chicxulub Impact Crater, Gulf of México: Implications for the assembly of Pangea

https://doi.org/10.1016/j.gr.2019.12.003

Zhao, Jiawei; Xiao, Long; Gulick, Sean P.S.; Morgan, Joanna V.; Kring, David; Fucugauchi, Jaime Urrutia; Schmieder, Martin; de Graaff, Sietze J.; Wittmann, Axel; Ross, Catherine H.; et al (June 2020, Gondwana Research)

Full Text Available

Search for: All records